-
Notifications
You must be signed in to change notification settings - Fork 3.8k
CASSANDRA-20918 Add cursor-based low allocation optimized compaction implementation #4402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
patch by Nitsan Wakart; reviewed by TBD for CASSANDRA-TBD - EPPv1 - all new code - Cursor compaction integration - JMH benchmarks for compaction and cursor impls - EPPv1 - New tests - Existing tests tweaks for new code - [revert?] change the default partitioner to expand testing of new code - [revert?] test data used some benchmarks - [revert?] jmh tweak GC settings for stability - [revert?] javadoc typos, marking unused params - [revert?] clarifying comment - [revert?] toString improvement - [revert?] remove spurious keywords - [revert?] marking metadata collection - [revert?] cursor verifier - Exclude SAI and counter column - Exclude BTI and legacy versions - Temporarily skip very long running test
src/java/org/apache/cassandra/db/compaction/CompactionTask.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/db/compaction/CompactionTask.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/db/compaction/CompactionTask.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It must be stated that this approach that bundles all the steps of the processing in one single file will be quite difficult to maintain and keep in sync with the combination of iterators and transformations that we use in other parts of the code such as the query path. However, once we have reached a point of stability for a piece of functionality where we do not expect it to change significantly for a long time, it does makes sense to unpack the code and present it in a way that makes its execution as direct as possible, and this patch is a good such representation of the compaction process.
Personally, I am very unhappy about switching to mutable, pooled and reused objects, which are significantly more unwieldy and error prone, especially in contexts where concurrent access can occur. It seems this is becoming a necessity if we need to achieve acceptable performance with the current state of our heap usage, but we still need to very carefully separate the mutable versions of concepts from the immutable ones used throughout the code base. Suddenly making a DeletionTime mutable is not an acceptable change.
First batch of targeted comments below, mainly going over CompactionCursor.java.
src/java/org/apache/cassandra/db/compaction/CompactionTask.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/db/compaction/CompactionTask.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/db/compaction/CompactionCursor.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/db/compaction/CompactionCursor.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/db/compaction/CompactionCursor.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/db/compaction/CompactionCursor.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/db/compaction/CompactionCursor.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Next batch of comments.
| } | ||
| if (!UnfilteredSerializer.hasAllColumns(flags)) | ||
| { | ||
| // TODO: re-implement GC free |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataStax's branch has an implementation of it.
src/java/org/apache/cassandra/io/sstable/format/SortedTableWriter.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/db/compaction/writers/DefaultCompactionWriter.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/io/sstable/metadata/MetadataCollector.java
Show resolved
Hide resolved
src/java/org/apache/cassandra/io/sstable/metadata/MetadataCollector.java
Outdated
Show resolved
Hide resolved
… elsewhere using seek
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Next batch of comments.
| import org.apache.cassandra.dht.Murmur3Partitioner; | ||
| import org.jctools.util.UnsafeAccess; | ||
|
|
||
| public class ReusableLongToken extends Murmur3Partitioner.LongToken |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This shouldn't need to be public.
| import org.apache.cassandra.utils.ByteArrayUtil; | ||
| import org.apache.cassandra.utils.ByteBufferUtil; | ||
|
|
||
| public class ReusableDecoratedKey extends DecoratedKey |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is pretty hacky. It shouldn't be hard to move the support for reusable tokens to the partitioner (throwing exceptions for all except Murmur and local).
src/java/org/apache/cassandra/io/sstable/AbstractSSTableSimpleWriter.java
Show resolved
Hide resolved
src/java/org/apache/cassandra/io/sstable/SSTableCursorReader.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/io/sstable/SSTableCursorReader.java
Outdated
Show resolved
Hide resolved
| appendBIGIndex(partitionKey, partitionKeyLength, partitionStart, headerLength, partitionDeletionTime, partitionEnd); | ||
| } | ||
|
|
||
| private void appendBIGIndex(byte[] key, int keyLength, long partitionStart, int headerLength, DeletionTime partitionDeletionTime, long partitionEnd) throws IOException |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it not easy to modify and reuse the index building code from BigFormatPartitionWriter? The duplication here seems quite unnecessary.
| private final int indexBlockThreshold; | ||
|
|
||
|
|
||
| private SSTableCursorWriter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class should be split into a common SortedTableCursorWriter, with format-specific subclasses that instantiate the index builders it uses, and placed into the correct per-format packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently only BIG is supported, so I held off here. I think the index code split can be postponed to the time when the BTI format is supported. It may also be interesting to explore splitting the SSTable write phase and indexing phase so as to allow more flexibility in composing the phases (e.g. index on the fly/index per partition write/index at end of write, parallel index pass etc).
…nMerge` and return true when partitions remain
Javadoc for bubbleInsertToPreSorted
Simplify deletion merging loop, and some refactoring of names
src/java/org/apache/cassandra/db/compaction/CompactionTask.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/db/compaction/CompactionCursor.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/db/compaction/CompactionCursor.java
Outdated
Show resolved
Hide resolved
| { | ||
| if (DatabaseDescriptor.enableCursorCompaction()) { | ||
| try { | ||
| if (CompactionCursor.isSupported(scanners, controller)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should still log whether we are doing cursor compaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now debug logged in the isSupported method, with a reason for the rejection. Do you think further logging is required?
patch by Nitsan Wakart; reviewed by TBD for CASSANDRA-TBD
Thanks for sending a pull request! Here are some tips if you're new here:
Commit messages should follow the following format:
The Cassandra Jira